

**IN THE CLAIMS**

Following are the claims as currently pending for consideration.

1. (Previously Presented) A computer system comprising:
  - a processor; and
  - a storage device coupled to the processor and having stored therein an instruction, which when executed by the processor, causes the processor to at least,
    - access a packed data operand having at least two portions of data elements;
    - select a set of data elements from any data elements in a portion of the packed data operand, the portion including at least two data elements;
    - copy each data element of the selected set of data elements to any specified data fields located in the corresponding portion of a destination operand.
2. (Original) The computer system of claim 1, wherein the packed data operand includes eight data elements and the processor selects a set of data elements from one of either the upper half or the lower half of the packed data operand.
3. (Original) The computer system of claim 1, wherein the storage device further comprises a packing device for packing integer data into the data elements.
4. (Original) The computer system of claim 1, wherein the data elements are 16-bit data elements and wherein the data packed and destination operands are each 128-bit operands.

5. (Original) The computer system of claim 1, wherein the data packed and destination operands are the same operand.
6. (Previously Presented) A computer-implemented method comprising:
  - decoding a single instruction;
  - in response to decoding the single instruction, accessing a packed data operand including at least two portions of data elements;
  - selecting a set of data elements from any data elements in a portion of the packed data operand, the portion including at least two data elements;
  - copying each data element of the selected set of data elements to any specified data fields located in the corresponding portion of a destination operand.
7. (Original) The method of claim 6, wherein accessing a packed data operand including eight data elements and selecting a set of data elements from one of either the upper half or the lower half of the packed data operand.
8. (Original) The method of claim 6, further comprising packing integer data into the data elements.
9. (Original) The method of claim 6, wherein the data elements are 16-bit data elements and wherein the data packed and destination operands are each 128-bit operands.

10. (Original) The method of claim 6, wherein the data packed and destination operands are the same operand.

11. (Previously Presented) A computer-implemented method comprising:  
accessing data representative of a first three-dimensional image;  
altering the data using three-dimensional geometry to generate a second three-dimensional image, the method of altering at least including,  
accessing a packed data operand having at least two portions of data elements;  
selecting a set of data elements from any data elements in a portion of the packed data operand, the portion including at least two data elements;  
copying each data element of the selected set of data elements to any specified data fields located in the corresponding portion of a destination operand; and  
displaying the second three-dimensional image.

12. (Original) The method of claim 11, wherein the method of altering includes the performance of a three-dimensional transformation.

13. (Original) The method of claim 11, wherein accessing a packed data operand including eight data elements and selecting a set of data elements from one of either the upper half or the lower half of the packed data operand.

14. (Original) The method of claim 11, wherein the method of altering includes packing integer data into the data elements.

15. (Original) The method of claim 11, wherein the data elements are 16-bit data elements and wherein the data packed and destination operands are each 128-bit operands.

16. (Original) The method of claim 11, wherein the data packed and destination operands are the same operand.

17-22. (Cancelled)

23. (Previously Presented) A program loaded into a computer readable medium comprising:

a computer readable code to access a packed data operand having at least two portions of data elements;

a computer readable code to select a portion of the packed data operand, the portion including at least two data elements;

a computer readable code to select a set of data elements from any data elements in the selected portion;

a computer readable code to copy each data element of the selected set of data elements to any specified data fields located in the corresponding portion of a destination operand.

24. (Original) The program of claim 23, wherein the source operand and the destination operand are the same operand.

25. (Original) The program of claim 23, wherein $m = 2$ and the computer readable code to select a set of data elements selects one of either the upper half or the lower half of the packed data operand.

26-30. (Cancelled)

31. (Previously Presented) A processor-implemented method responsive to a single instruction, the method comprising:  
accessing a source register including at least two portions of data elements;  
copying each data element of a selected set of data elements from any data elements in a portion of the source register, said portion including at least two data elements, to any specified data fields located in a corresponding portion of a destination register.

32. (Previously Presented) The method of claim 31, wherein the source register of 128 bits is the same register as the destination register, and the single instruction has a control word of eight bits to specify 16-bit data fields located in the corresponding portion of the destination register.

33. (Previously Presented) A processor comprising:

a decoder to decode:

a first instruction specifying a first source operand, a first destination operand,

a second instruction specifying a second source operand, a second destination operand, and

a third instruction specifying a third source operand, a third destination operand; and

an execution unit, responsive to the first instruction, to copy each data element of a selected first set of data elements from any data elements in a first portion including at least two data elements of the first source operand to any specified data fields located in a corresponding first portion of the first destination operand; responsive to the second instruction, to copy each data element of a selected second set of data elements from any data elements in a second portion including at least two data elements of the second source operand to any specified data fields located in a corresponding second portion of the second destination operand; and responsive to the third instruction, to copy each data element of a selected third set of data elements from any data elements in a third portion including at least two data elements of the third source operand to any specified data fields located in a corresponding third portion of the third destination operand.

34. (Previously Presented) The processor of claim 33, wherein the first instruction is to copy 16-bit data elements from the first source register of 128 bits to the first destination register of 128 bits, the second instruction is to copy data elements from the upper half of the second source register to the upper half of the second destination register, and the third instruction is to copy data elements from the lower half of the third source register to the lower half of the third destination register.

35. (Previously Presented) The processor of claim 33, wherein the processor is comprised of either or both hardware and software components.

36. (Previously Presented) A method for shuffling packed data elements comprising:  
decoding a single instruction;  
in response to decoding the single instruction, accessing a source register having a packed data operand including at least two portions of data elements;  
selecting a set of data elements from any data elements in a portion of the packed data operand, the portion including at least two data elements; and  
copying each data element of the selected set of data elements to any specified data fields located in the corresponding portion of a destination register.

37. (Previously Presented) The method of claim 36, wherein the portion is one of either the upper half or the lower half of both the source and the destination registers.

38. (Previously Presented) The method of claim 36, wherein the data fields located in the corresponding portion of a destination register are specified by a field of control bits of the single instruction.

39. (Previously Presented) A processor comprising:

a decoder to decode at least one single instruction; and  
an execution unit, responsive to the at least one single instruction, to copy each data element of a set of data elements from any data elements in a portion of a source register, the portion including at least two data elements, to any specified data fields located in a corresponding portion of a destination register.

40. (Previously Presented) The processor of claim 39, wherein the portion is one of either the upper half or the lower half of the source and destination registers.

41. (Previously Presented) The processor of claim 39, wherein the decoder is to decode a first instruction to shuffle 16-bit data elements from a first source register of 128 bits to a first destination register of 128 bits, a second instruction to shuffle data elements from the upper half of a second source register to the upper half of a second destination register, and a third instruction to shuffle data elements from the lower half of a third source register to the lower half of a third destination register.
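For illustration only, and not as part of any claim, the half-operand shuffle recited above (for example in claims 32, 37, and 41) might be modeled in software as in the sketch below. The function name `shuffle_half`, the encoding of two control bits per destination field, and the pass-through of the unselected half are all assumptions made for this sketch; the claims do not fix any of them.

```python
def shuffle_half(src, control, upper):
    """Model a shuffle of 16-bit elements within one half of a 128-bit operand.

    src     -- list of eight 16-bit values, index 0 being the lowest element
    control -- 8-bit control word; bits [2k+1:2k] select which element of the
               chosen half is copied to destination field k of that same half
               (this encoding is an assumption, not taken from the claims)
    upper   -- True to shuffle the upper half, False to shuffle the lower half
    """
    if len(src) != 8:
        raise ValueError("expected eight data elements")
    if not 0 <= control <= 0xFF:
        raise ValueError("control word must fit in eight bits")

    base = 4 if upper else 0
    # Assumption: the half that is not shuffled is copied through unchanged.
    dst = list(src)
    for k in range(4):
        sel = (control >> (2 * k)) & 0b11   # two control bits per field
        dst[base + k] = src[base + sel]     # any source element, any field
    return dst


src = [0xA0, 0xA1, 0xA2, 0xA3, 0xB0, 0xB1, 0xB2, 0xB3]

# Reverse the four elements of the upper half.
reversed_upper = shuffle_half(src, 0b00011011, upper=True)

# Broadcast the lowest element across the lower half.
broadcast_lower = shuffle_half(src, 0b00000000, upper=False)
```

Because every destination field reads from `src` rather than from `dst`, the same sketch also covers the case in which the source and destination are the same operand: the result can simply be assigned back to the source.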